A study of the effect of term proximity on query expansion
Authors: Olga Vechtomova and Ying Wang
Abstract
Query expansion terms are often used to enhance original query formulations in document retrieval. Such terms are usually selected from entire documents, or from windows or passages surrounding query term occurrences. Arguably, the semantic relatedness between terms weakens as the distance separating them increases. In this paper we report a study conducted to systematically evaluate different distance functions for selecting query expansion terms. We propose a distance factor that can be effectively combined with the statistical term association measure of mutual information for selecting query expansion terms. Evaluation on the TREC collection shows that distance-weighted mutual information is more effective than mutual information alone in selecting terms for query expansion.
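The general idea of weighting mutual information by distance can be sketched as follows. This is only an illustrative sketch, not the authors' formulation: the abstract does not state which distance functions were evaluated, so the inverse-distance factor `1 / d`, the window size, and the function name `distance_weighted_mi` are all assumptions made for the example.

```python
import math
from collections import Counter


def distance_weighted_mi(docs, query_term, window=5,
                         distance_factor=lambda d: 1.0 / d):
    """Score candidate expansion terms for `query_term` (hypothetical helper).

    docs            -- list of documents, each a list of tokens
    window          -- tokens considered on each side of a query term occurrence
    distance_factor -- assumed weighting of a co-occurrence at distance d >= 1;
                       the paper's actual distance functions are not given here
    """
    term_freq = Counter()
    weighted_cooc = Counter()
    total_tokens = 0

    for tokens in docs:
        total_tokens += len(tokens)
        term_freq.update(tokens)
        for i, tok in enumerate(tokens):
            if tok != query_term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i or tokens[j] == query_term:
                    continue
                # Each co-occurrence contributes a distance-dependent weight
                # instead of a plain count of 1.
                weighted_cooc[tokens[j]] += distance_factor(abs(i - j))

    scores = {}
    if term_freq[query_term] == 0:
        return scores

    p_q = term_freq[query_term] / total_tokens
    for term, weight in weighted_cooc.items():
        p_joint = weight / total_tokens
        p_t = term_freq[term] / total_tokens
        # Pointwise mutual information computed on the weighted joint estimate.
        scores[term] = math.log2(p_joint / (p_q * p_t))
    return scores


docs = [
    ["query", "expansion", "improves", "retrieval"],
    ["retrieval", "of", "documents", "with", "query", "expansion"],
]
weighted = distance_weighted_mi(docs, "query")
uniform = distance_weighted_mi(docs, "query", distance_factor=lambda d: 1.0)
ranked = sorted(weighted, key=weighted.get, reverse=True)
```

With a constant factor the score reduces to window-based mutual information, so "retrieval" and "expansion" tie in this toy corpus. With the assumed inverse-distance factor, "expansion", which is always adjacent to the query term, scores higher than "retrieval", which occurs further away.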
Similar resources
Lexical cohesion and term proximity in document ranking
We demonstrate effective new methods of document ranking based on lexical cohesive relationships between query terms. The proposed methods rely solely on the lexical relationships between original query terms, and do not involve query expansion or relevance feedback. Two types of lexical cohesive relationship information between query terms are used in document ranking: short-distance collocati...
Integration of Collocation Statistics into the Probabilistic Retrieval Model
The paper presents a method of combining corpus information on word collocations with the probabilistic model of information retrieval. Corpus term dependencies are used to modify the probabilistic retrieval based on the term independence assumption. Collocates are derived from windows around term occurrences in the corpus. Statistical measures of mutual information and Z score are applied to s...
Enterprise Search: Identifying Relevant Sentences and Using Them for Query Expansion
In this paper, we discuss the experiments conducted in the context of the Document Search task of the 2007 Enterprise Search track. Our method is based on selecting sentences from the given relevant documents and using those sentences for query expansion. We observed that this method of query expansion improves the system's performance over the baseline run under various methods of comparison.
Facet-based opinion retrieval from blogs
The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback-Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurr...